Multiscale Scattering for Audio Classification
نویسندگان
چکیده
Mel-frequency cepstral coefficients (MFCCs) are efficient audio descriptors providing spectral energy measurements over short time windows of length 23 ms. These measurements, however, lose non-stationary spectral information such as transients or time-varying structures. It is shown that this information can be recovered as spectral co-occurrence coefficients. Scattering operators compute these coefficients with a cascade of wavelet filter banks and modulus rectifiers. The signal can be reconstructed from scattering coefficients by inverting these wavelet modulus operators. An application to genre classification shows that second-order cooccurrence coefficients improve results obtained by MFCC and Delta-MFCC descriptors. 1
منابع مشابه
A Novel Noise-Robust Texture Classification Method Using Joint Multiscale LBP
In this paper we describe a novel noise-robust texture classification method using joint multiscale local binary pattern. The first step in texture classification is to describe the texture by extracting different features. So far, several methods have been developed for this topic, one of the most popular ones is Local Binary Pattern (LBP) method and its variants such as Completed Local Binary...
متن کاملDetection of Melanoma Skin Cancer by Elastic Scattering Spectra: A Proposed Classification Method
Introduction: There is a strong need for developing clinical technologies and instruments for prompt tissue assessment in a variety of oncological applications as smart methods. Elastic scattering spectroscopy (ESS) is a real-time, noninvasive, point-measurement, optical diagnostic technique for malignancy detection through changes at cellular and subcellular levels, especially important in ear...
متن کاملPairwise Decomposition with Deep Neural Networks and Multiscale Kernel Subspace Learning for Acoustic Scene Classification
We propose a system for acoustic scene classification using pairwise decomposition with deep neural networks and dimensionality reduction by multiscale kernel subspace learning. It is our contribution to the Acoustic Scene Classification task of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE2016). The system classifies 15 different acoustic scenes. ...
متن کاملListening to the World Improves Speech Command Recognition
We study transfer learning in convolutional network architectures applied to the task of recognizing audio, such as environmental sound events and speech commands. Our key finding is that not only is it possible to transfer representations from an unrelated task like environmental sound classification to a voice-focused task like speech command recognition, but also that doing so improves accur...
متن کاملMultiscale Sparse Microcanonical Models
We study density estimation of stationary processes defined over an infinite grid from a single, finite realization. Gaussian Processes and Markov Random Fields avoid the curse of dimensionality by focusing on low-order and localized potentials respectively, but its application to complex datasets is limited by their inability to capture singularities and long-range interactions, and their expe...
متن کامل